This document explores a dataset of a survey of students' skills and knowledge as they approach the end of compulsory education. It is not a conventional school test. Rather than examining how well students have learned the school curriculum, it looks at how well prepared they are for life beyond school.
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
%matplotlib inline
# Load dataset
pisa_df=pd.read_csv("pisa2012.csv", encoding='latin-1')
Let's take a first look at the data before dividing it into smaller tables that are easier to work with
pisa_df.sample(5)
# high-level overview of data shape and composition
print(pisa_df.shape)
pisa_df.dtypes.head(20)
# descriptive statistics for numeric variables
pisa_df.describe()
For International Grade (ST01Q01), the mean is 9.8, the median and 75th percentile are both 10, and the maximum is 96, which seems to be an outlier.
For ST02Q01, which this analysis uses as the grade, the maximum is 25, which is clearly an outlier because the highest valid value is 6.
Any grade above 6 is an outlier; one option is to replace it with the average.
Any NaN could be replaced with the average as well.
(pisa_df.ST02Q01.isna().sum()+(pisa_df.ST02Q01>6).sum())/pisa_df.shape[0]*100
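Since these rows make up less than 10% of the data, we can remove them instead
# drop rows with a missing grade (assigning a dropna() result back to the column would not remove any rows)
pisa_df=pisa_df.dropna(subset=["ST02Q01"])
pisa_df=pisa_df[pisa_df.ST02Q01 <= 6]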
pisa_df.ST02Q01.unique()
Let's verify that no missing values or outliers remain
pisa_df.ST02Q01.isna().sum()
pisa_df[pisa_df.ST02Q01>6]
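The cleaning rule above (measure the share of missing or out-of-range grades, then drop those rows if the share is small) can be sketched on a toy frame. The values below are invented for illustration, and `max_grade` is a name introduced only for this sketch.

```python
import numpy as np
import pandas as pd

# Toy stand-in for pisa_df; the values are invented for illustration.
toy = pd.DataFrame({"ST02Q01": [1, 2, 3, 4, 5, 6] * 4 + [25, np.nan]})

max_grade = 6  # highest valid grade assumed in this analysis

# Rows that are either missing or above the valid maximum.
invalid = toy["ST02Q01"].isna() | (toy["ST02Q01"] > max_grade)
invalid_pct = invalid.mean() * 100

# Boolean indexing removes the NaN rows and the outliers in one step.
toy_clean = toy[~invalid].reset_index(drop=True)

print(f"{invalid_pct:.1f}% of rows removed, {len(toy_clean)} rows kept")
```

Building the mask once keeps the percentage check and the actual filtering consistent with each other.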
Let's examine each variable alone
# How many countries participated in PISA 2012?
pisa_df.CNT.nunique()
# How many national centres (NC) administered the test?
pisa_df.NC.nunique()
# How many countries took the test under another country's national centre?
pisa_df[~pisa_df.CNT.isin(pisa_df.NC)].CNT.nunique()
There are 37 countries whose name does not appear as a national centre
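A minimal sketch of the `isin` check used above, on toy rows. The country and centre names are placeholders rather than real PISA values. Note that the check asks whether a country name appears anywhere in the NC column, not whether CNT equals NC on the same row.

```python
import pandas as pd

# Toy rows; "A", "B", "C" are placeholder names, not PISA values.
toy = pd.DataFrame({
    "CNT": ["A", "A", "B", "C", "C"],
    "NC":  ["A", "A", "A", "C", "C"],
})

# Countries whose name never appears in the NC column.
no_own_centre = toy.loc[~toy["CNT"].isin(toy["NC"]), "CNT"].unique()
print(list(no_own_centre))
```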
# How many null values are in each column?
pisa_df.isnull().sum()
It is hard to understand anything from the data in this form, so let's divide it into tables
# Here we will explore the data based on the mindset of the student
agree_disagree_col=["ST02Q01", "ST29Q03", "ST29Q04", "ST29Q05", "ST29Q06", "ST29Q07", "ST29Q08", "ST35Q01", "ST35Q02",
"ST35Q03", "ST35Q04", "ST35Q05", "ST35Q06", "ST42Q01", "ST42Q02", "ST42Q03", "ST42Q04", "ST42Q05",
"ST42Q06", "ST42Q07", "ST42Q08", "ST42Q09", "ST42Q10", "ST43Q01", "ST43Q02", "ST43Q03", "ST43Q04",
"ST43Q05", "ST43Q06", "ST46Q01", "ST46Q02", "ST46Q03", "ST46Q04", "ST46Q05", "ST46Q06", "ST46Q07",
"ST46Q08", "ST46Q09", "ST86Q01", "ST86Q02", "ST86Q03", "ST86Q04", "ST86Q05", "ST87Q01", "ST87Q02",
"ST87Q03", "ST87Q04", "ST87Q05", "ST87Q06", "ST87Q07", "ST87Q08", "ST87Q09", "ST88Q01", "ST88Q02",
"ST88Q03", "ST88Q04", "ST89Q02", "ST89Q03", "ST89Q04", "ST89Q05", "ST91Q01", "ST91Q02", "ST91Q03",
"ST91Q04", "ST91Q05", "ST91Q06", "IC22Q01", "IC22Q02", "IC22Q04", "IC22Q06", "IC22Q07", "IC22Q08",
"ST84Q01", "ST84Q02", "ST84Q03", "ST85Q01"]
agree_disagree_df=pisa_df[agree_disagree_col]
# Here we will look at the teacher's role in supporting the student
teacher_roles=["ST02Q01","ST77Q01", "ST77Q02", "ST77Q04", "ST77Q05", "ST77Q06", "ST79Q01", "ST79Q02", "ST79Q05", "ST79Q06",
"ST79Q08","ST79Q11", "ST79Q12", "ST79Q15", "ST79Q17", "ST80Q01", "ST80Q04", "ST80Q05", "ST80Q06", "ST80Q07",
"ST80Q08", "ST80Q09", "ST80Q10", "ST80Q11", "ST82Q01", "ST82Q02", "ST82Q03", "ST83Q01", "ST83Q02", "ST83Q03",
"ST83Q04", "ST85Q02", "ST85Q03", "ST85Q04"]
teacher_roles_df=pisa_df[teacher_roles]
# Here we will look at the student's role and what makes each student different from the others
student_roles=["ST02Q01","ST49Q01", "ST49Q02", "ST49Q03", "ST49Q04", "ST49Q05", "ST49Q06", "ST49Q07", "ST49Q09",
"ST53Q01", "ST53Q02", "ST53Q03", "ST53Q04", "ST55Q01", "ST55Q02", "ST55Q03", "ST55Q04", "ST57Q01",
"ST57Q02", "ST57Q03", "ST57Q04", "ST57Q05", "ST57Q06", "ST79Q03", "ST79Q04", "ST79Q07", "ST79Q10",
"ST81Q01", "ST81Q02", "ST81Q03", "ST81Q04", "ST81Q05", "ST44Q01", "ST44Q03", "ST44Q04", "ST44Q05",
"ST44Q07", "ST44Q08", "ST37Q01", "ST37Q02", "ST37Q03", "ST37Q04", "ST37Q05", "ST37Q06", "ST37Q07",
"ST37Q08", "IC09Q02", "IC09Q03", "IC09Q07", "EC03Q01", "EC03Q02", "EC03Q03", "EC03Q04", "EC03Q05",
"EC03Q06", "EC03Q07", "ST96Q02", "ST96Q03", "ST96Q05", "ST101Q01", "ST101Q02", "ST101Q03", "ST101Q05",
"ST104Q01", "ST104Q04", "ST104Q05", "ST104Q06"]
student_roles_df=pisa_df[student_roles]
# Here we will see whether the facilities affect the grades or not
facilities=["ST02Q01", "IC01Q01", "IC01Q02", "IC01Q03", "IC01Q04", "IC01Q05", "IC01Q06", "IC01Q07", "IC01Q08",
"IC01Q09", "IC01Q10", "IC01Q11", "IC02Q01", "IC02Q02", "IC02Q03", "IC02Q04", "IC02Q05", "IC02Q06",
"IC02Q07", "ST26Q01", "ST26Q02", "ST26Q03", "ST27Q01", "ST27Q02", "ST27Q03", "ST27Q04", "ST27Q05",
"ST28Q01", "ST29Q01", "ST29Q02"
]
facilities_df=pisa_df[facilities]
# How many duplicated rows are in the data?
pisa_df.duplicated().sum()
Quality
Tidy
Dependent variable: ST02Q01 (PISA grade, 1 to 6)
There are 485490 students in the dataset with 636 variables; you can look up their descriptions in "pisadict2012.csv"
The survey is organised into several groups of questions. We are interested in: agree or disagree items, attributions to failure, math behaviour, learning strategies, teachers and their roles, the student and their mindset, and facilities at home and school.
Some of the data is ordinal, such as the agree or disagree items (Strongly agree > Agree > Disagree > Strongly disagree)
About 268 of the variables are numeric.
I'm most interested in figuring out which features matter most for studying and what the top-performing students do.
I expect the teacher's role and the student's mindset to have a large impact on the student's grade, and study hours to be positively related to the grade.
# copy the df
agree_disagree_df_clean=agree_disagree_df.copy()
teacher_roles_df_clean=teacher_roles_df.copy()
student_roles_df_clean=student_roles_df.copy()
facilities_df_clean=facilities_df.copy()
The column names in agree_disagree_df_clean are not descriptive
# renaming the columns
agree_disagree_df_clean=agree_disagree_df_clean.rename(columns={
"ST02Q01":"grade", "ST29Q03":"look_forward_to_lessons", "ST29Q04":"enjoy_maths", "ST29Q05":"worthwhile_for_career_chances",
"ST29Q06":"interested_in_math", "ST29Q07":"important_for_future_study", "ST29Q08":"helps_to_get_a_job",
"ST35Q01":"friends_do_well_in_mathematics","ST35Q02":"friends_work_hard_on_mathematics","ST35Q03":"friends_enjoy_mathematics_tests",
"ST35Q04":"parents_believe_studying_mathematics_is_important", "ST35Q05":"parents_believe_mathematics_is_important_for_career",
"ST35Q06":"parents_like_mathematics", "ST42Q01":"worry_that_it_will_be_difficult", "ST42Q02":"believe_not_good_at_maths",
"ST42Q03":"very_tense_math", "ST42Q04":'believe_good_grades_in_math', "ST42Q05":"very_nervous_math","ST42Q06":"learn_quickly_math",
"ST42Q07":"math_one_of_best_subjects", "ST42Q08":"feel_helpless", "ST42Q09":"understand_difficult_work", "ST42Q10":"worry_about_getting_poor_grades",
"ST43Q01":"can_succeed_with_enough_effort", "ST43Q02":"doing_well_is_completely_up_to_me", "ST43Q03":"family_demands_and_problems",
"ST43Q04":"different_teachers","ST43Q05":"if_i_wanted_i_could_perform_well", "ST43Q06":"perform_poorly_regardless",
"ST46Q01":"homework_completed_in_time", "ST46Q02":"work_hard_on_homework", "ST46Q03":"prepared_for_exams", "ST46Q04":"study_hard_for_quizzes",
"ST46Q05":"study_until_i_understand_everything", "ST46Q06":"pay_attention_in_classes", "ST46Q07":"listen_in_classes",
"ST46Q08":"avoid_distractions_when_studying", "ST46Q09":"keep_work_organized", "ST86Q01":"get_along_with_teachers",
"ST86Q02":"teachers_are_interested", "ST86Q03":"teachers_listen_to_students", "ST86Q04":"teachers_help_students",
"ST86Q05":"teachers_treat_students_fair", "ST87Q01":"feel_like_outsider", "ST87Q02":"make_friends_easily",
"ST87Q03":"belong_at_school", "ST87Q04":"feel_awkward_at_school", "ST87Q05":"liked_by_other_students",
"ST87Q06":"feel_lonely_at_school", "ST87Q07":"feel_happy_at_school", "ST87Q08":"things_are_ideal_at_school",
"ST87Q09":"satisfied_at_school", "ST88Q01":"does_little_to_prepare_me_for_life", "ST88Q02":"waste_of_time",
"ST88Q03":"gave_me_confidence", "ST88Q04":"useful_for_job", "ST89Q02":"sc_helps_to_get_a_job", "ST89Q03":"prepare_for_college",
"ST89Q04":"enjoy_good_grades", "ST89Q05":"trying_hard_is_important", "ST91Q01":"can_succeed_with_enough_effort_school",
"ST91Q02":"my_choice_whether_i_will_be_good", "ST91Q03":"problems_prevent_from_putting_effort_into_school",
"ST91Q04":"different_teachers_would_make_me_try_harder", "ST91Q05":"could_perform_well_if_i_wanted",
"ST91Q06":"perform_poor_regardless", "IC22Q01":"useful_for_schoolwork", "IC22Q02":"homework_more_fun",
"IC22Q04":"source_of_information", "IC22Q06":"troublesome", "IC22Q07":"not_suitable_for_schoolwork", "IC22Q08":"too_unreliable",
"ST84Q01":"students_frequently_interrupt/teacher_arrives_early", "ST84Q02":"students_are_calm/teacher_arrives_on_time",
"ST84Q03":"students_frequently_Interrupt/teacher_Arrives_late", "ST85Q01":"students_listen"
})
agree_disagree_df_clean.info()
The column names in teacher_roles_df_clean are not descriptive
# The list that contains the new column names, in order
teacher_col=["grade", "shows_interest", "extra_help", "helps", "continues", "express_opinions", "sets_clear_goals",
"encourages_thinking_and_reasoning", "gives_feedback", "checks_understanding", "summarizes_previous_lessons",
"gives_feedback_on_strengths_and_weaknesses", "informs_about_expectations", "informs_about_learning_goals",
"tells_how_to_get_better", "teacher_encourages_to_reflect_problems", "gives_problems_that_require_to_think",
"asks_to_use_own_procedures", "presents_problems_with_no_obvious_solutions", "presents_problems_in_different_contexts",
"helps_learn_from_mistakes", "asks_for_explanations", "apply_what_we_learned", "problems_with_multiple_solutions",
"hw_every_day_back_in_time", "hw_once_a_week_back_in_time", "hw_once_a_week_not_Back_in_time",
"lets_us_know_we_have_to_work_hard", "provides_extra_help_when_needed", "helps_students_with_learning",
"gives_opportunity_to_express_opinions", "keeps_class_orderly", "starts_on_time", "wait_long_to_quiet_down"]
teacher_dict=dict(zip(teacher_roles, teacher_col))
teacher_roles_df_clean=teacher_roles_df_clean.rename(columns=teacher_dict)
teacher_roles_df_clean.info()
The column names in student_roles_df_clean are not descriptive
# The list that contains the new column names, in order
# p ==> what the student does when facing a problem
student_col=["grade", "talk_about_maths_with_friends", "help_friends_with_maths", "extracurricular_activity",
"participate_in_competitions", "study_more_than_2_extra_hours_a_day", "play_chess", "computer_programming",
"participate_in_math_club", "important_parts_vs._existing_knowledge_vs._learn_by_heart",
"improve_understanding_vs._new_ways_vs._memory", "other_subjects_vs._learning_goals_vs._rehearse_problems",
"repeat_examples_vs._everyday_applications_vs._more_information", "test_lang", "maths", "science", "other",
"homework", "guided_homework", "personal_tutor", "commercial_company", "with_parent", "computer",
"differentiates_between_students_when_giving_tasks", "assigns_complex_projects",
"has_students_work_in_small_groups", "plans_classroom_activities","students_don’t_listen", "noise_and_disorder",
"teacher_has_to_wait_until_its_quiet", "students_don’t_work_well", "students_start_working_late",
"af_not_good_at_maths_problems", "af_teacher_did_not_explain_well", "af_bad_guesses", "af_material_too_hard",
"af_teacher_did'nt_get_students_interested", "af_unlucky", "cf_using_a_train_timetable", "cf_calculating_tV_discount",
"cf_calculating_square_metres_of_tiles", "cf_understanding_graphs_in_newspapers", "cf_solving_equation_1",
"cf_distance_to_scale", "cf_solving_equation_2", "cf_calculate_petrol_consumption_rate", "email_students",
"email_teachers", "share_school_material","internship", "work_site_visits", "job_fair", "career_advisor_at_school",
"career_advisor_outside_school", "questionnaire", "internet_search", "p_trace_steps", "Manual", "ask_a_friend",
"read_brochure", "study_map", "leave_it_to_brother", "just_drive", "similarities", "try_buttons", "ask_for_help",
"find_ticket_office"]
student_dict=dict(zip(student_roles, student_col))
student_roles_df_clean=student_roles_df_clean.rename(columns=student_dict)
student_roles_df_clean.info()
The column names in facilities_df_clean are not descriptive
# The list that contains the new column names, in order
# h ==> Home , s ==> school, N ==> Number
facilities_col=["grade", "h_pc", "h_laptop", "h_tablet", "h_internet", "h_video_games", "h_cell_phone_w/o_internet",
"h_cell_phone_with_internet", "h_mp3/mp4_player", "h_printer", "h_usb_(memory)_stick", "h_ebook_reader",
"s_pc", "s_laptop", "s_tablet", "s_internet", "s_printer", "s_usb_(memory)_stick", "s_ebook_reader",
"desk", "own_room", "study_place", "N_cellular_phones", "N_televisions", "N_computers", "N_cars",
"N_rooms_bath_or_shower", "N_books_at_home", "enjoy_Reading", "worthwhile_for_work"]
facilities_dict=dict(zip(facilities, facilities_col))
facilities_df_clean=facilities_df_clean.rename(columns=facilities_dict)
facilities_df_clean.info()
Let's assess the data once more
Let's start with the first table agree_disagree_df_clean
# check for nulls in columns
agree_disagree_df_clean.isnull().sum()
# Show the descriptive statistics
agree_disagree_df_clean.describe()
# Show the descriptive statistics visually
agree_disagree_df_clean.describe()[1:].plot(kind="bar", legend=False);
Following a discussion on how to deal with null categorical data, I will treat missing data as just another category
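One way to treat missing answers as their own category is sketched below. The label "Missing" and its position at the start of the ordering are assumptions made for this illustration; the notebook does not fix either choice.

```python
import numpy as np
import pandas as pd

levels = ["Strongly disagree", "Disagree", "Agree", "Strongly agree"]
answers = pd.Series(["Agree", np.nan, "Disagree", "Strongly agree", np.nan])

# "Missing" is a hypothetical label chosen for this sketch.
dtype = pd.CategoricalDtype(categories=["Missing"] + levels, ordered=True)
answers_cat = answers.fillna("Missing").astype(dtype)

# Counts are listed in category order, so "Missing" appears as its own level.
print(answers_cat.value_counts(sort=False))
```

Keeping the missing answers as a level means they show up as a separate bar in a count plot instead of being silently dropped.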
Let's assess the second table teacher_roles_df_clean
# check for nulls in columns
teacher_roles_df_clean.isnull().sum()
# Show the descriptive statistics
teacher_roles_df_clean.describe()
Let's assess the next table student_roles_df_clean
# check for nulls in columns
student_roles_df_clean.isnull().sum()
# Show the descriptive statistics
student_roles_df_clean.describe()
We will replace every null in the numeric columns with the column average (rounded up)
for col in student_roles_df_clean.describe().columns:
    mask = student_roles_df_clean[col].isna()
    student_roles_df_clean.loc[mask,col]=np.ceil(student_roles_df_clean[col].mean())
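The same imputation can also be written without an explicit loop. This is a sketch on a toy frame with invented values; `numeric_cols` and `fill_values` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with invented values; the text column is left untouched.
toy = pd.DataFrame({
    "homework": [1.0, 2.0, np.nan, 6.0],
    "with_parent": [0.0, np.nan, 1.0, 1.0],
    "play_chess": ["Sometimes", None, "Often", "Sometimes"],
})

# Select numeric columns, compute each rounded-up mean, and fill per column.
numeric_cols = toy.select_dtypes(include="number").columns
fill_values = np.ceil(toy[numeric_cols].mean())
toy[numeric_cols] = toy[numeric_cols].fillna(fill_values)

print(toy)
```

Passing a Series to `fillna` fills each column with the value stored under that column's name, so every column gets its own average.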
Let's assess the last table facilities_df_clean
# check for nulls in columns
facilities_df_clean.isnull().sum()
# Show the descriptive statistics
facilities_df_clean.describe()
# Let's see the distribution of the grade
pisa_df.ST02Q01.hist(rwidth=0.9);
plt.xlabel("Grade");
The most common grade is 1 and the least common is 6
A grade becomes less common as it increases
# convert the degree of agree columns in the table into ordered categorical types
student_ideas=["worry_that_it_will_be_difficult", "believe_not_good_at_maths", "very_tense_math", "believe_good_grades_in_math",
"very_nervous_math", "learn_quickly_math", "math_one_of_best_subjects", "feel_helpless", "understand_difficult_work",
"worry_about_getting_poor_grades", "can_succeed_with_enough_effort", "doing_well_is_completely_up_to_me",
"family_demands_and_problems", "different_teachers", "if_i_wanted_i_could_perform_well", "perform_poorly_regardless"]
ordinal_var_dict = {}
for col in student_ideas:
    ordinal_var_dict[col]=["Strongly disagree", "Disagree", "Agree", "Strongly agree"]
for var in ordinal_var_dict:
    ordered_var = pd.api.types.CategoricalDtype(ordered = True,
                                                categories = ordinal_var_dict[var])
    agree_disagree_df_clean[var] = agree_disagree_df_clean[var].astype(ordered_var)
# Let's test
agree_disagree_df_clean.worry_that_it_will_be_difficult.unique()
# Let's explore the students' ideas about the subject
nrows=int(np.ceil(len(student_ideas)/2))
ncols=2
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize = [20,40])
default_color = sb.color_palette()[0]
for i in range(len(student_ideas)):
    sb.countplot(data = agree_disagree_df_clean, x = student_ideas[i], color = default_color, ax = ax[i//ncols,i%ncols])
plt.show()
Many students are afraid of the exam: they worry that it will be difficult and worry about poor grades. At the same time they are confident: they believe they can get good grades in math, that they could perform well if they wanted to, that doing well is completely up to them, and that they can succeed with enough effort
Students aren't nervous or very tense, and they don't think that they will perform poorly regardless
Most students don't like math, or it is just not one of their best subjects
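The conversion step above repeats for every group of columns in this notebook, so it could be factored into a small helper. This is only a sketch: `to_ordered` is a hypothetical function name, and the toy answers are invented.

```python
import pandas as pd

def to_ordered(df, columns, levels):
    """Return a copy of df with the given columns cast to one ordered categorical dtype."""
    dtype = pd.CategoricalDtype(categories=levels, ordered=True)
    out = df.copy()
    for col in columns:
        out[col] = out[col].astype(dtype)
    return out

agree_levels = ["Strongly disagree", "Disagree", "Agree", "Strongly agree"]

# Toy answers, invented for illustration.
toy = pd.DataFrame({
    "feel_helpless": ["Agree", "Disagree", "Strongly agree"],
    "learn_quickly_math": ["Disagree", "Agree", "Agree"],
})
toy = to_ordered(toy, toy.columns, agree_levels)

# Ordered categories support min/max and sort in level order.
print(toy["feel_helpless"].min(), "<", toy["feel_helpless"].max())
```

Because the levels are passed in, the same helper would also work for the frequency scales ("Never or rarely" and so on) used by other tables.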
# convert the degree of agree columns in the table into ordered categorical types
student_effort=["homework_completed_in_time", "work_hard_on_homework", "prepared_for_exams", "study_hard_for_quizzes",
"study_until_i_understand_everything", "pay_attention_in_classes", "listen_in_classes",
"avoid_distractions_when_studying", "keep_work_organized"]
ordinal_var_dict = {}
for col in student_effort:
    ordinal_var_dict[col]=["Strongly disagree", "Disagree", "Agree", "Strongly agree"]
for var in ordinal_var_dict:
    ordered_var = pd.api.types.CategoricalDtype(ordered = True,
                                                categories = ordinal_var_dict[var])
    agree_disagree_df_clean[var] = agree_disagree_df_clean[var].astype(ordered_var)
# Let's explore student effort about subjects
nrows=int(np.ceil(len(student_effort)/2))
ncols=2
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize = [20,40])
default_color = sb.color_palette()[0]
for i in range(len(student_effort)):
    sb.countplot(data = agree_disagree_df_clean, x = student_effort[i], color = default_color, ax = ax[i//ncols,i%ncols])
plt.show()
It seems that the students in this data are disciplined, or at least believe they are. We still need to check whether that affects the grades, and which effort has the highest impact.
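One possible way to check whether an effort item relates to the grade is a rank correlation between the ordinal codes and the grade. This is a sketch on invented toy values, not a result from the PISA data, and the choice of Spearman's correlation is an assumption rather than the notebook's method.

```python
import pandas as pd

levels = ["Strongly disagree", "Disagree", "Agree", "Strongly agree"]

# Toy values, invented for illustration only.
toy = pd.DataFrame({
    "grade": [1, 2, 2, 3, 4, 5],
    "work_hard_on_homework": ["Disagree", "Disagree", "Agree", "Agree",
                              "Strongly agree", "Strongly agree"],
})
toy["work_hard_on_homework"] = toy["work_hard_on_homework"].astype(
    pd.CategoricalDtype(categories=levels, ordered=True)
)

# Integer codes follow the category order (0 = "Strongly disagree").
# Missing answers would get code -1 and should be filtered out first.
codes = toy["work_hard_on_homework"].cat.codes

# Pearson correlation of the ranks equals Spearman's rank correlation.
rho = codes.rank().corr(toy["grade"].rank())
print(round(rho, 3))
```

Repeating this for every effort column and sorting by the absolute value of `rho` would give a rough ranking of which item is most strongly associated with the grade.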
# convert the degree of agree columns in the table into ordered categorical types
student_relation=["get_along_with_teachers", "teachers_are_interested", "teachers_listen_to_students",
"teachers_help_students", "teachers_treat_students_fair", "feel_awkward_at_school", "liked_by_other_students",
"feel_like_outsider", "make_friends_easily", "belong_at_school",
"feel_lonely_at_school", "feel_happy_at_school", "things_are_ideal_at_school", "satisfied_at_school"]
ordinal_var_dict = {}
for col in student_relation:
    ordinal_var_dict[col]=["Strongly disagree", "Disagree", "Agree", "Strongly agree"]
for var in ordinal_var_dict:
    ordered_var = pd.api.types.CategoricalDtype(ordered = True,
                                                categories = ordinal_var_dict[var])
    agree_disagree_df_clean[var] = agree_disagree_df_clean[var].astype(ordered_var)
# Let's explore the students' relationships at school
nrows=int(np.ceil(len(student_relation)/2))
ncols=2
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize = [20,40])
default_color = sb.color_palette()[0]
for i in range(len(student_relation)):
    sb.countplot(data = agree_disagree_df_clean, x = student_relation[i], color = default_color, ax = ax[i//ncols,i%ncols])
plt.show()
It seems that the students in this data are social (they have good relationships with their friends and teachers) and feel that they belong at school, or at least believe so
# convert the degree of agree columns in the table into ordered categorical types
student_sch_thought_attitude=["does_little_to_prepare_me_for_life", "waste_of_time", "gave_me_confidence", "useful_for_job",
"sc_helps_to_get_a_job", "prepare_for_college", "enjoy_good_grades", "trying_hard_is_important",
"can_succeed_with_enough_effort_school", "my_choice_whether_i_will_be_good",
"problems_prevent_from_putting_effort_into_school", "different_teachers_would_make_me_try_harder",
"could_perform_well_if_i_wanted", "perform_poor_regardless", "useful_for_schoolwork",
"homework_more_fun", "source_of_information", "troublesome", "not_suitable_for_schoolwork",
"too_unreliable"]
ordinal_var_dict = {}
for col in student_sch_thought_attitude:
    ordinal_var_dict[col]=["Strongly disagree", "Disagree", "Agree", "Strongly agree"]
for var in ordinal_var_dict:
    ordered_var = pd.api.types.CategoricalDtype(ordered = True,
                                                categories = ordinal_var_dict[var])
    agree_disagree_df_clean[var] = agree_disagree_df_clean[var].astype(ordered_var)
# Let's explore the students' thoughts and attitudes about school
nrows=int(np.ceil(len(student_sch_thought_attitude)/2))
ncols=2
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize = [20,40])
default_color = sb.color_palette()[0]
for i in range(len(student_sch_thought_attitude)):
    sb.countplot(data = agree_disagree_df_clean, x = student_sch_thought_attitude[i],
                 color = default_color, ax = ax[i//ncols,i%ncols])
plt.show()
Students think that school is not a waste of time, that it gave them confidence, that it not only helps to get a job but is also useful for the job, and that it prepares them for college. They also believe that it is their own choice whether they will be good, and they agree with the "source of information" item
# convert the degree of agree columns in the table into ordered categorical types
teacher_instructions=["sets_clear_goals", "encourages_thinking_and_reasoning", "gives_feedback", "checks_understanding",
"summarizes_previous_lessons", "gives_feedback_on_strengths_and_weaknesses", "informs_about_expectations",
"informs_about_learning_goals", "tells_how_to_get_better",]
ordinal_var_dict = {}
for col in teacher_instructions:
    ordinal_var_dict[col]=['Never or Hardly Ever', 'Some Lessons', 'Most Lessons','Every Lesson']
for var in ordinal_var_dict:
    ordered_var = pd.api.types.CategoricalDtype(ordered = True,
                                                categories = ordinal_var_dict[var])
    teacher_roles_df_clean[var] = teacher_roles_df_clean[var].astype(ordered_var)
# Let's explore teacher instructions
nrows=int(np.ceil(len(teacher_instructions)/2))
ncols=2
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize = [20,20])
default_color = sb.color_palette()[0]
for i in range(len(teacher_instructions)):
    sb.countplot(data = teacher_roles_df_clean, x = teacher_instructions[i],
                 color = default_color, ax = ax[i//ncols,i%ncols])
plt.show()
Teachers help the students
# convert the degree of agree columns in the table into ordered categorical types
teacher_role1=["teacher_encourages_to_reflect_problems", "gives_problems_that_require_to_think", "asks_to_use_own_procedures",
"presents_problems_with_no_obvious_solutions", "presents_problems_in_different_contexts",
"helps_learn_from_mistakes", "asks_for_explanations", "apply_what_we_learned", "problems_with_multiple_solutions"]
ordinal_var_dict = {}
for col in teacher_role1:
    ordinal_var_dict[col]=['Never or rarely', 'Sometimes', 'Often','Always or almost always']
for var in ordinal_var_dict:
    ordered_var = pd.api.types.CategoricalDtype(ordered = True,
                                                categories = ordinal_var_dict[var])
    teacher_roles_df_clean[var] = teacher_roles_df_clean[var].astype(ordered_var)
# Let's explore the teacher's role in problem solving
nrows=int(np.ceil(len(teacher_role1)/2))
ncols=2
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize = [20,40])
default_color = sb.color_palette()[0]
for i in range(len(teacher_role1)):
    sb.countplot(data = teacher_roles_df_clean, x = teacher_role1[i],
                 color = default_color, ax = ax[i//ncols,i%ncols])
plt.show()
Teachers in this data are studious
# convert the degree of agree columns in the table into ordered categorical types
teacher_support=["hw_every_day_back_in_time", "hw_once_a_week_back_in_time", "hw_once_a_week_not_Back_in_time",
"lets_us_know_we_have_to_work_hard", "provides_extra_help_when_needed", "helps_students_with_learning",
"gives_opportunity_to_express_opinions", "keeps_class_orderly", "starts_on_time", "wait_long_to_quiet_down"]
ordinal_var_dict = {}
for col in teacher_support:
    ordinal_var_dict[col]=["Strongly disagree", "Disagree", "Agree", "Strongly agree"]
for var in ordinal_var_dict:
    ordered_var = pd.api.types.CategoricalDtype(ordered = True,
                                                categories = ordinal_var_dict[var])
    teacher_roles_df_clean[var] = teacher_roles_df_clean[var].astype(ordered_var)
# Let's explore teacher support
nrows=int(np.ceil(len(teacher_support)/2))
ncols=2
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize = [20,20])
default_color = sb.color_palette()[0]
for i in range(len(teacher_support)):
    sb.countplot(data = teacher_roles_df_clean, x = teacher_support[i],
                 color = default_color, ax = ax[i//ncols,i%ncols])
plt.show()
Again, the teachers and students in this data are studious
# convert the degree of agree columns in the table into ordered categorical types
student_confidence=["cf_using_a_train_timetable", "cf_calculating_tV_discount", "cf_calculating_square_metres_of_tiles",
"cf_understanding_graphs_in_newspapers", "cf_solving_equation_1", "cf_distance_to_scale",
"cf_solving_equation_2", "cf_calculate_petrol_consumption_rate"]
ordinal_var_dict = {}
for col in student_confidence:
    ordinal_var_dict[col]=["Not at all confident", "Not very confident", "Confident","Very confident"]
for var in ordinal_var_dict:
    ordered_var = pd.api.types.CategoricalDtype(ordered = True,
                                                categories = ordinal_var_dict[var])
    student_roles_df_clean[var] = student_roles_df_clean[var].astype(ordered_var)
# Let's explore student_confidence
nrows=int(np.ceil(len(student_confidence)/2))
ncols=2
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize = [20,20])
default_color = sb.color_palette()[0]
for i in range(len(student_confidence)):
    sb.countplot(data = student_roles_df_clean, x = student_confidence[i],
                 color = default_color, ax = ax[i//ncols,i%ncols])
plt.show()
Students are mostly confident
# convert the degree of agree columns in the table into ordered categorical types
student_attribution_failure=["af_not_good_at_maths_problems", "af_teacher_did_not_explain_well", "af_bad_guesses",
"af_material_too_hard", "af_teacher_did'nt_get_students_interested", "af_unlucky"]
ordinal_var_dict = {}
for col in student_attribution_failure:
    ordinal_var_dict[col]=[ "Not at all likely", "Slightly likely", "Likely","Very Likely"]
for var in ordinal_var_dict:
    ordered_var = pd.api.types.CategoricalDtype(ordered = True,
                                                categories = ordinal_var_dict[var])
    student_roles_df_clean[var] = student_roles_df_clean[var].astype(ordered_var)
# Let's explore the attributions to failure
nrows=int(np.ceil(len(student_attribution_failure)/2))
ncols=2
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize = [20,10])
default_color = sb.color_palette()[0]
for i in range(len(student_attribution_failure)):
    sb.countplot(data = student_roles_df_clean, x = student_attribution_failure[i],
                 color = default_color, ax = ax[i//ncols,i%ncols])
plt.show()
The main reasons for bad grades that many students agree on are not being good at the subject (math) and the material being too hard. The other reasons seem to be roughly normally distributed across the answer levels
# convert the degree of agree columns in the table into ordered categorical types
student_way=["talk_about_maths_with_friends", "help_friends_with_maths", "extracurricular_activity", "participate_in_competitions",
"study_more_than_2_extra_hours_a_day", "play_chess", "computer_programming", "participate_in_math_club"]
ordinal_var_dict = {}
for col in student_way:
    ordinal_var_dict[col]=['Never or rarely', 'Sometimes', 'Often','Always or almost always']
for var in ordinal_var_dict:
    ordered_var = pd.api.types.CategoricalDtype(ordered = True,
                                                categories = ordinal_var_dict[var])
    student_roles_df_clean[var] = student_roles_df_clean[var].astype(ordered_var)
# Let's explore the students' math-related activities
nrows=int(np.ceil(len(student_way)/2))
ncols=2
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize = [20,20])
default_color = sb.color_palette()[0]
for i in range(len(student_way)):
    sb.countplot(data = student_roles_df_clean, x = student_way[i],
                 color = default_color, ax = ax[i//ncols,i%ncols])
plt.show()
Students in this dataset mostly do not participate in activities related to the subject (math)
# convert the degree of agree columns in the table into ordered categorical types
student_orientation =["assigns_complex_projects","has_students_work_in_small_groups", "plans_classroom_activities",
"students_don’t_listen", "noise_and_disorder", "teacher_has_to_wait_until_its_quiet",
"students_don’t_work_well", "students_start_working_late"]
ordinal_var_dict = {}
for col in student_orientation:
    ordinal_var_dict[col]=['Never or Hardly Ever', 'Some Lessons', 'Most Lessons','Every Lesson']
for var in ordinal_var_dict:
    ordered_var = pd.api.types.CategoricalDtype(ordered = True,
                                                categories = ordinal_var_dict[var])
    student_roles_df_clean[var] = student_roles_df_clean[var].astype(ordered_var)
# Let's explore classroom orientation and discipline
nrows=int(np.ceil(len(student_orientation)/2))
ncols=2
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize = [20,20])
default_color = sb.color_palette()[0]
for i in range(len(student_orientation)):
    sb.countplot(data = student_roles_df_clean, x = student_orientation[i],
                 color = default_color, ax = ax[i//ncols,i%ncols])
plt.show()
Students in this data don't work on complex projects or in small groups
Students think that their classmates don't listen and make noise in some lessons, don't work well, and start working late
# convert the degree of agree columns in the table into ordered categorical types
student_and_problem=["p_trace_steps", "Manual", "ask_a_friend"]
ordinal_var_dict = {}
for col in student_and_problem:
    ordinal_var_dict[col]=['definitely not do this', 'probably not do this', 'probably do this','definitely do this']
for var in ordinal_var_dict:
    ordered_var = pd.api.types.CategoricalDtype(ordered = True,
                                                categories = ordinal_var_dict[var])
    student_roles_df_clean[var] = student_roles_df_clean[var].astype(ordered_var)
# Let's explore how students approach a problem
nrows=int(np.ceil(len(student_and_problem)/2))
ncols=2
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize = [20,10])
default_color = sb.color_palette()[0]
for i in range(len(student_and_problem)):
    sb.countplot(data = student_roles_df_clean, x = student_and_problem[i],
                 color = default_color, ax = ax[i//ncols,i%ncols])
plt.show()
Students tend to trace the steps and ask friends when facing a problem
# Let's plot the numeric variables together to get an idea of each one's distribution.
nrows=5
ncols=3
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize = [20,20])
default_color = sb.color_palette()[0]
for i, col in enumerate(student_roles_df_clean.describe().columns):
    sb.countplot(data = student_roles_df_clean, x = col, color = default_color, ax = ax[i%nrows,i%ncols])
plt.show()
The previous plots show how students react when facing a problem
just drive, leave it to brother and study map seem to be normally distributed
read brochure seems to be right skewed
The previous plots also show how students remember the problem
Try buttons seems to be normally distributed
similarities, ask for help and find ticket office seem to be right skewed
Study time outside school
with parent, computer, homework, guided homework, personal tutor and commercial company seem to have many outliers and are skewed, so a log scale may be useful here
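A log scale for skewed durations could be set up with logarithmically spaced bin edges. The sketch below uses synthetic right-skewed values rather than the PISA columns, and the number of bins (20) is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic right-skewed "hours" values, invented for illustration only.
hours = rng.lognormal(mean=0.5, sigma=1.0, size=1000)

# Zeros need separate handling because log10(0) is undefined.
positive = hours[hours > 0]

# 21 logarithmically spaced edges give 20 bins of equal width on a log axis.
edges = 10 ** np.linspace(np.log10(positive.min()), np.log10(positive.max()), 21)
# Pin the end points so rounding cannot push a value outside the bins.
edges[0], edges[-1] = positive.min(), positive.max()

counts, _ = np.histogram(positive, bins=edges)
print(counts)
```

The same `edges` array can be passed as `bins` to `plt.hist`, followed by `plt.xscale("log")`, so that the bars have equal width on the log axis.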
In fact, all the study time variables are skewed, so let's visualize them on their own
cols=["with_parent", "computer", "homework", "guided_homework", "personal_tutor", "commercial_company"]
nrows=3
ncols=2
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize = [20,20])
for i in range(len(cols)):
    var = cols[i]
    # fill the grid row by row
    ax[i//ncols,i%ncols].hist(data = student_roles_df_clean, x = var, bins = 50)
    ax[i//ncols,i%ncols].set_xlabel('{}'.format(var), size=14)
plt.show()
These are still hard to read. Let's plot each one on its own, using xlim to cut off the long tails
# plot the time spent studying with a parent
plt.figure(figsize=[12,8])
plt.hist(data=student_roles_df_clean, x="with_parent", bins=60);
plt.xticks(list(range(11)))
for i in range (10):
    count = student_roles_df_clean.with_parent.value_counts()[i]
    # Refer here for details of the text() - https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.text.html
    plt.text(i, count+10000, count, ha = 'left', va='top')
plt.xlabel("Time studying with parent", size=14)
plt.xlim(0,10);
Students don't spend much time studying with their parents: at most about 2 hours
# plot the time spent studying on a computer
plt.figure(figsize=[12,8])
plt.hist(data=student_roles_df_clean, x="computer", bins=60);
plt.xticks(list(range(11)))
for i in range (10):
    count = student_roles_df_clean.computer.value_counts()[i]
    # Refer here for details of the text() - https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.text.html
    plt.text(i, count+10000, count, ha = 'left', va='top')
plt.xlabel("Time studying on computer", size=14)
plt.xlim(0,10);
Most of them also don't spend much time studying on a computer: at most about 2 hours
# plot the time spent on homework
plt.figure(figsize=[12,8])
# sb.distplot is deprecated; histplot with a density scale and a KDE curve draws the same kind of chart
sb.histplot(student_roles_df_clean.homework, bins=60, stat="density", kde=True, color = default_color);
plt.xticks(list(range(11)))
plt.xlabel("Time spent on homework", size=14)
plt.xlim(0,10);
Most bars have a frequency of 25,000-50,000 (a density of 0.1-0.2), but six hours stands out as a value that students seem to agree on regardless of their year.
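The two scales quoted above (counts and densities) describe the same bars. A minimal sketch of how they relate, using synthetic hours rather than the PISA values (the variable names are illustrative only):

```python
import numpy as np

# Synthetic study hours (hypothetical data, not the PISA values)
rng = np.random.default_rng(0)
hours = rng.integers(0, 11, size=1000).astype(float)

edges = np.arange(0, 11.5, 0.5)  # half-hour bins from 0 to 11
counts, _ = np.histogram(hours, bins=edges)
density, _ = np.histogram(hours, bins=edges, density=True)

widths = np.diff(edges)
# density = count / (n * bin width), so the bar areas sum to 1
manual_density = counts / (counts.sum() * widths)
total_area = float((density * widths).sum())
```

A density of 0.1-0.2 therefore only translates into a count once the sample size and the bin width are known.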
# plot the time spent on guided homework
plt.figure(figsize=[12,8])
# sb.distplot is deprecated; histplot with a density scale and a KDE curve draws the same kind of chart
sb.histplot(student_roles_df_clean.guided_homework, bins=60, stat="density", kde=True, color = default_color);
plt.xticks(list(range(11)))
plt.xlabel("Time spent on guided homework", size=14)
plt.xlim(0,10);
Students spend fewer hours on guided homework.
# plot the time spent with a personal tutor
plt.figure(figsize=[12,8])
sb.histplot(student_roles_df_clean.personal_tutor, bins=60);
plt.xticks(list(range(6)))
plt.xlabel("Time spent with a personal tutor", size=14)
plt.xlim(0,5);
Time with a personal tutor is between 0 and 2 hours only.
# Let's explore the facilities available to students
nrows=int(np.ceil(len(facilities_col)/2))
ncols=2
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize = [20,60])
default_color = sb.color_palette()[0]
for i in range(len(facilities_col)):
    sb.countplot(data = facilities_df_clean, x = facilities_col[i],
                 color = default_color, ax = ax[i//ncols,i%ncols])
plt.show()
Most students have a PC, laptop, internet, cell phone, video games and a printer at home and at school.
The most common number of devices is three or more.
Students in this data do not seem to like reading.
# Let's explore the gender
(pisa_df.ST04Q01.value_counts()/pisa_df.shape[0]).plot(kind="bar")
The genders are evenly represented, with no visible bias.
The most common grade is 1 and the least common is 6. Grades become less frequent as they increase, which is expected because, according to the test levels, 1 is the worst and 6 is the best. Grade 1 may also be especially frequent because it covers both 1a and 1b.
In the agree_disagree_df_clean table I tried to convert every variable to ordinal data. Students seem to think they are perfect: they agree with any question describing good behaviour and disagree with any describing bad behaviour. They also worry about poor grades and exam difficulty.
The students in this data are disciplined, or at least believe they are.
The students in this data are social and feel that they belong, or at least believe so.
The study times have long tails, so I used plt.xlim to cut off the unwanted tail.
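As an alternative to cutting the tail, bins that are evenly spaced on a log scale keep the whole range visible. The following is only a sketch on synthetic data; `log_bin_edges` is a hypothetical helper, not part of the notebook:

```python
import numpy as np

# Synthetic long-tailed "hours" (hypothetical, not the PISA values)
rng = np.random.default_rng(1)
hours = rng.lognormal(mean=0.5, sigma=1.0, size=2000)

def log_bin_edges(values, n_bins=20):
    """Return n_bins + 1 edges evenly spaced in log10 over the positive values."""
    positive = values[values > 0]
    edges = np.logspace(np.log10(positive.min()), np.log10(positive.max()), n_bins + 1)
    # Pin the end points so rounding cannot push the extremes outside the bins
    edges[0], edges[-1] = positive.min(), positive.max()
    return edges

edges = log_bin_edges(hours)
counts, _ = np.histogram(hours, bins=edges)
```

In the notebook these edges could be passed as `bins=edges` to `plt.hist`, followed by `plt.xscale('log')`. Zero hours cannot be drawn on a log axis, so students reporting 0 would have to be counted separately.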
def make_boxplot(var_list, width=20, height=40):
    """
    Draw one boxplot of grade per variable, two plots per row
    var_list: list of column names in agree_disagree_df_clean
    width, height: figure size in inches
    """
    nrows=int(np.ceil(len(var_list)/2))
    ncols=2
    # squeeze=False keeps ax two-dimensional even when there is a single row
    fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize = [width,height], squeeze = False)
    default_color = sb.color_palette()[0]
    for i in range(len(var_list)):
        sb.boxplot(data = agree_disagree_df_clean, x = var_list[i], y=agree_disagree_df_clean.grade,
                   color = default_color, ax = ax[i//ncols,i%ncols])
    plt.show()
def make_dist(var_list, table):
    """
    Draw, for each variable, one grade histogram per level of that variable
    var_list: list of column names in table
    table: DataFrame holding those columns and a grade column
    """
    for var in var_list:
        # Levels sorted by mean grade (not passed to FacetGrid, so facets keep their default order)
        group_means = table.groupby(var)["grade"].mean()
        group_order = group_means.sort_values(ascending = False).index
        g = sb.FacetGrid(data = table, col = var, col_wrap = 4, height = 4)
        g.map(plt.hist, "grade", bins = np.arange(0, 6+0.5, 0.5))
        g.set_ylabels(var, size=14)
        g.set_xlabels("Grade", size=14)
        g.set_titles('{col_name}', size=14)
Let's see the relation with grades
Let's start with agree_disagree_df_clean
# Let's see the relation with the grade
make_boxplot(student_ideas)
All variables seem to have the same range and IQR, and every median is 1, possibly because grade 1 has such a large count.
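When one value dominates, the median can coincide with the lower quartile, so the median line falls on the edge of the box. A small sketch with made-up grade counts (not the real PISA counts) illustrates this:

```python
import numpy as np

# Hypothetical grade counts in which grade 1 dominates
grades = np.repeat([1, 2, 3, 4, 5, 6], [600, 150, 100, 80, 50, 20])

# Quartiles as a boxplot would compute them
q1, median, q3 = np.percentile(grades, [25, 50, 75])
```

Here `q1` and `median` are both 1, which would make the median line indistinguishable from the bottom of the box.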
Let's draw violin plots, this time excluding grade 1.
# Grades other than 1
no_grade_one=agree_disagree_df_clean[agree_disagree_df_clean.grade != 1]
# Let's see the relation with the grade
nrows=int(np.ceil(len(student_ideas)/2))
ncols=2
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize = [20,40])
default_color = sb.color_palette()[0]
for i in range(len(student_ideas)):
    sb.violinplot(data=no_grade_one, x=student_ideas[i], y=no_grade_one.grade, color = default_color,
                  ax = ax[i//ncols,i%ncols]);
plt.show()
Indeed, the median that did not show in the box plots appears here. In the box plots it was 1, which is why it was not visible.
# Let's see the distribution for each one alone
make_dist(student_ideas, agree_disagree_df_clean)
The previous charts make it clear that beliefs affect grades.
Good beliefs and self-confidence lead to high grades, and bad beliefs lead to poor grades.
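A numeric summary can back up what the facets suggest. This is a minimal sketch on a made-up table; the column name only mirrors the notebook's naming style and the values are invented:

```python
import pandas as pd

# Hypothetical mini-table (not the PISA data)
table = pd.DataFrame({
    "enjoy_good_grades": ["Agree", "Agree", "Disagree", "Disagree", "Agree", "Disagree"],
    "grade": [3, 2, 1, 2, 4, 1],
})

# Mean grade and group size per answer, highest mean first
summary = (table.groupby("enjoy_good_grades")["grade"]
                .agg(["mean", "count"])
                .sort_values("mean", ascending=False))
```

Keeping the group size next to the mean helps to judge whether a difference rests on many students or only a few.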
# Let's see the relation with the grade
make_boxplot(student_effort)
It seems that students in this data either misreport their effort, saying they did not study although they get higher grades (perhaps because they are afraid of envy), or simply study well.
# Let's see the distribution for each one alone
make_dist(student_effort, agree_disagree_df_clean)
The previous graph makes it clear that more effort means higher grades.
# Let's see the relation with the grade
var=["get_along_with_teachers", "teachers_are_interested", "teachers_listen_to_students", "teachers_help_students",
"teachers_treat_students_fair", "feel_awkward_at_school", "liked_by_other_students"]
make_boxplot(var)
Students who think that they are liked by others get much higher grades than the rest.
Whether teachers treat them fairly, listen to them, help them or are interested in them does not seem to matter.
We can say that high grades depend on the students themselves.
# Let's see the distribution for each one alone using faceting
make_dist(var, agree_disagree_df_clean)
The relation with teachers is positively related to grades, and teachers play a great role in high grades.
# Let's see the relation with the grade
var=["feel_like_outsider", "make_friends_easily", "belong_at_school"]
make_boxplot(var, 20, 10)
Students who feel like outsiders get higher marks.
It does not matter whether you make friends easily or whether you feel you belong at school.
# Let's see the distribution for each one alone using faceting
make_dist(var, agree_disagree_df_clean)
Unlike in the boxplots, here the outsiders get lower marks, and belonging at school and making friends easily lead to high grades.
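When two chart types seem to disagree, comparing the share of each grade within each answer group removes the effect of group size. A minimal sketch with invented values (not the PISA data):

```python
import pandas as pd

# Hypothetical answers and grades; the two groups differ in size on purpose
table = pd.DataFrame({
    "feel_like_outsider": ["Agree"] * 4 + ["Disagree"] * 8,
    "grade": [1, 1, 2, 3, 1, 1, 1, 1, 2, 2, 3, 4],
})

# Row-normalised table: each row sums to 1, so groups of different size are comparable
shares = pd.crosstab(table["feel_like_outsider"], table["grade"], normalize="index")
```

Raw count histograms are dominated by the larger group, whereas these shares show how grades are spread inside each group.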
# Let's see the relation with the grade using boxplots
var=["feel_lonely_at_school", "feel_happy_at_school", "things_are_ideal_at_school", "satisfied_at_school"]
make_boxplot(var, 20, 10)
Psychological state affects the grade.
# Let's see the distribution for each one alone using faceting
make_dist(var, agree_disagree_df_clean)
Psychological state affects the grade.
# Let's see the relation with the grade using boxplots
var=["does_little_to_prepare_me_for_life", "waste_of_time", "gave_me_confidence", "useful_for_job",
"helps_to_get_a_job", "prepare_for_college", "enjoy_good_grades", "trying_hard_is_important",
"can_succeed_with_enough_effort_school", "my_choice_whether_i_will_be_good",
"problems_prevent_from_putting_effort_into_school", "different_teachers_would_make_me_try_harder",
"could_perform_well_if_i_wanted", "perform_poor_regardless"]
make_boxplot(var)
# Let's see the distribution for each one alone using faceting
make_dist(var, agree_disagree_df_clean)
Students who think that school is not a waste of time, that it gave them confidence, that it is useful for a job and helps to get one, that trying hard is important, and that high grades depend on themselves get high grades.
# Let's see the relation with the grade using boxplots
var=["useful_for_schoolwork", "homework_more_fun", "source_of_information", "troublesome", "not_suitable_for_schoolwork",
"too_unreliable"]
make_boxplot(var, 20, 10)
# Let's see the distribution for each one alone using faceting
make_dist(var, agree_disagree_df_clean)
Good attitudes lead to good grades and bad attitudes lead to bad grades.
# Let's see the distribution for each one alone using faceting
make_dist(["parents_believe_studying_mathematics_is_important",
"parents_believe_mathematics_is_important_for_career",
"parents_like_mathematics"], agree_disagree_df_clean)
Parents' beliefs affect students' grades.
Phew, we have just finished agree_disagree_df_clean, the first table. Let's go to the next one, teacher_roles_df_clean.
# Let's see the distribution for each one alone using faceting
make_dist(teacher_instructions, teacher_roles_df_clean)
Setting goals is important for high grades, and the more often you remind yourself or your students of them, the higher the grades.
Encouraging students to think and reason has a positive relation with grades: the more students are encouraged, the higher the grades.
Giving feedback does not affect the grades very much.
Checking understanding affects the grades tremendously.
Summarizing affects the grades a bit.
Telling students how to get better increases the grades.
# Let's see the distribution for each one alone using faceting
make_dist(teacher_role1, teacher_roles_df_clean)
Making students think helps to increase grades, but not always; teachers sometimes need to help students.
The more the teacher asks students for explanations, the higher the grades.
Making students apply what they have learned helps to increase grades, but teachers should not do that all the time.
# Let's see the distribution for each one alone using faceting
make_dist(teacher_support, teacher_roles_df_clean)
Again, the teacher's role is important.
More homework means higher grades.
Let's go to the next one
# Let's see the distribution for each one alone using faceting
make_dist(student_confidence, student_roles_df_clean)
To get high grades you need not be very confident on most of the items above; being confident is enough. The exception is solving equations, where being very confident is preferable: the higher the confidence, the higher the grades.
# Let's see the distribution for each one alone using faceting
make_dist(student_attribution_failure, student_roles_df_clean)
Students who think they are not good at math are likely to get high grades.
Students who think the teacher explained slightly poorly are more likely to get high grades.
Forming bad guesses does not affect the grade.
Students who pointed to the material get the highest marks.
The teacher not getting students interested does not affect the grades.
Students that thought doesn't affect the grades
# Let's see the distribution for each one alone using faceting
make_dist(student_way, student_roles_df_clean)
Good grades do not depend on participating in activities related to the subject.
# Let's see the distribution for each one alone using faceting
make_dist(["important_parts_vs._existing_knowledge_vs._learn_by_heart", "improve_understanding_vs._new_ways_vs._memory",
"other_subjects_vs._learning_goals_vs._rehearse_problems", "repeat_examples_vs._everyday_applications_vs._more_information"], student_roles_df_clean)
You should study the most important parts, improve understanding, set a learning goal and finally repeat examples.
# Let's see the distribution for each one alone using faceting
make_dist(student_orientation, student_roles_df_clean)
Students who are never assigned complex or small projects tend to get high grades.
# Let's see the distribution for each one alone using faceting
make_dist(student_and_problem, student_roles_df_clean)
Tracing the problem and asking friends lead to high grades.
# plot matrix: sample 500 student_roles_df_clean so that plots are clearer and
# they render faster
samples = np.random.choice(student_roles_df_clean.shape[0], 500, replace = False)
student_samp = student_roles_df_clean.iloc[samples]
g = sb.pairplot(data = student_samp, vars = list(student_roles_df_clean.describe().columns[:5]))
# plot matrix for the next five numeric columns, reusing the same 500-student sample
sb.pairplot(data = student_samp, vars = list(student_roles_df_clean.describe().columns[5:10]))
# plot matrix for the remaining numeric columns, reusing the same 500-student sample
sb.pairplot(data = student_samp, vars = list(student_roles_df_clean.describe().columns[10:]))
# correlation plot of student
plt.figure(figsize = [20, 10])
sb.heatmap(student_roles_df_clean[student_roles_df_clean.describe().columns].corr(), annot = True, fmt = '.3f',
cmap = 'vlag_r', center = 0)
plt.show()
The correlation is weak
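The default `corr()` measures linear association only. For ordinal or skewed variables a rank-based (Spearman) correlation may be worth checking as well. A minimal sketch on synthetic numbers, computing Spearman as the Pearson correlation of the ranks:

```python
import pandas as pd

# Synthetic, monotonic but non-linear relation (not the PISA data)
df = pd.DataFrame({"hours": range(1, 11)})
df["score"] = df["hours"] ** 3

pearson = df.corr()          # linear association
spearman = df.rank().corr()  # Pearson on ranks, i.e. Spearman's rho
```

A weak Pearson value together with a clearly higher Spearman value would hint at a monotonic relation that is not linear. Whether that applies to this table is left open here.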
make_dist(facilities_col[1:], facilities_df_clean)
Facilities help to get high grades, but only the essential ones (internet, PC, laptop, cell phone). Reading does not affect the grades, and neither does the number of books at home.
# Mean grade per gender, highest first (selecting the column first avoids averaging text columns)
group_means = pisa_df.groupby("ST04Q01")["ST02Q01"].mean()
group_order = group_means.sort_values(ascending = False).index
g = sb.FacetGrid(data = pisa_df, col = "ST04Q01", col_wrap = 4, height = 4)
g.map(plt.hist, "ST02Q01", bins = np.arange(0, 6+0.5, 0.5))
g.set_ylabels("Count", size=14)
g.set_xlabels("Grade", size=14)
g.set_titles('{col_name}', size=14)
There is no visible difference between the genders.
sb.countplot(data = pisa_df, x = 'ST04Q01', hue = 'ST02Q01', color=default_color)
# Set labels (the x axis is gender; the grade is shown by the hue)
plt.xlabel("Gender", size=14);
plt.ylabel("Count", size=14);
# Set title
plt.title("Relation between Grade and Gender");
Gender does not affect the grade.
Parents' beliefs affect students' grades.
You should study the most important parts, improve understanding, set a learning goal and finally repeat examples.
Most of the data is categorical, and the numerical variables have only a weak relation with grade, so they won't be explored further. There was a relation between student effort and grades: more effort appears to go with slightly better grades.
Create plots of three or more variables to investigate your data even further. Make sure that your investigations are justified, and follow from your work in the previous sections.
# Let's draw the relation in a heatmap
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
cat_means = agree_disagree_df_clean.groupby(['homework_completed_in_time', 'work_hard_on_homework'])['grade'].mean()
cat_means = cat_means.reset_index(name = 'grade_avg')
cat_means = cat_means.pivot(index = 'work_hard_on_homework', columns = 'homework_completed_in_time',
values = 'grade_avg')
sb.heatmap(cat_means, annot = True, fmt = '.3f',
cbar_kws = {'label' : 'mean(grade)'});
Students who complete their homework on time but do not work hard on it get the highest marks.
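The group-then-pivot steps behind such a heatmap can be wrapped in a small function. This is only a sketch: `mean_grade_table` is a hypothetical helper, and the demo columns and values are invented:

```python
import pandas as pd

def mean_grade_table(df, row, col, value):
    """Return the mean of `value` for every (row, col) pair as a 2-D table."""
    return df.pivot_table(index=row, columns=col, values=value, aggfunc="mean")

# Hypothetical demo data (not the PISA values)
demo = pd.DataFrame({
    "completed_in_time": ["Agree", "Agree", "Disagree", "Disagree"],
    "work_hard": ["Agree", "Disagree", "Agree", "Agree"],
    "grade": [2, 4, 1, 3],
})
table = mean_grade_table(demo, "work_hard", "completed_in_time", "grade")
```

The result has the same shape that `sb.heatmap` expects, and a combination with no students shows up as NaN rather than as a misleading zero.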
Let's see the relation between getting homework every day and completing it in time.
Since they come from different tables, we will use pisa_df, which has both.
# Let's order them first
var=["Strongly disagree", "Disagree", "Agree", "Strongly agree"]
cols=["ST46Q01", "ST46Q02", "ST46Q03", "ST46Q04", "ST46Q05", "ST46Q06", "ST46Q07", "ST46Q08", "ST46Q09",
"ST82Q01", "ST82Q02", "ST82Q03", "ST83Q01"]
# One ordered dtype shared by all the columns
ordered_var = pd.api.types.CategoricalDtype(ordered = True,
                                            categories = var)
for col in cols:
    pisa_df[col] = pisa_df[col].astype(ordered_var)
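One thing worth checking after such a conversion: any answer that is not among the listed categories silently becomes NaN. The sketch below uses an invented series to show the effect; "Never" is a made-up label, and it is an assumption (not verified here) that some columns might use a different answer scale:

```python
import pandas as pd

levels = ["Strongly disagree", "Disagree", "Agree", "Strongly agree"]
ordered = pd.api.types.CategoricalDtype(categories=levels, ordered=True)

# Hypothetical answers; "Never" is not one of the four levels
answers = pd.Series(["Agree", "Never", "Strongly agree", "Disagree"])
converted = answers.astype(ordered)

# Number of answers lost in the conversion
n_lost = int(converted.isna().sum() - answers.isna().sum())
# Ordered categories support comparisons
at_least_agree = converted >= "Agree"
```

Comparing `isna().sum()` before and after the cast for each column is a cheap way to confirm that nothing was dropped.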
# Let's draw the relation in a bar chart
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.barplot(data = pisa_df, x = 'ST46Q01', y = 'ST02Q01', hue = 'ST82Q01')
# Set the legend
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework everyday back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Homework completed in time");
Students who complete their homework in time and whose teacher gives homework every day (back in time) get higher grades.
# Let's draw the relation in a heatmap
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
cat_means = pisa_df.groupby(['ST46Q01', 'ST82Q01'])['ST02Q01'].mean()
cat_means = cat_means.reset_index(name = 'grade_avg')
cat_means = cat_means.pivot(index = 'ST82Q01', columns = 'ST46Q01',
values = 'grade_avg')
sb.heatmap(cat_means, annot = True, fmt = '.3f',
cbar_kws = {'label' : 'mean(grade)'});
# Set the labels
plt.ylabel("Homework everyday back in time")
plt.xlabel("Homework completed in time");
When the teacher gives homework every day (back in time) and the student completes it in time, grades are higher.
# Let's draw the relation in a pointplot
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.pointplot(data = pisa_df, x = 'ST46Q01', y = 'ST02Q01', hue = 'ST82Q01',
dodge = 0.3, linestyles = "")
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework everyday back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Homework completed in time");
The previous chart shows that students who get homework every day and complete it in time get the highest grades.
# Let's draw the relation in a pointplot
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.pointplot(data = pisa_df, x = 'ST46Q01', y = 'ST02Q01', hue = 'ST82Q02',
dodge = 0.3, linestyles = "")
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework Once a Week/Back in Time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Homework completed in time");
Homework once a week (back in time) goes with higher grades than homework every day.
Students who did not complete their homework get an average grade, whether or not they got homework once a week.
# Let's draw the relation in a bar chart
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.barplot(data = pisa_df, x = 'ST46Q01', y = 'ST02Q01', hue = 'ST82Q02')
# Set the legend
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework Once a Week/Back in Time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Homework completed in time");
Students who complete their homework in time and whose teacher gives homework once a week (back in time) get higher grades.
# Let's draw the relation in a heatmap
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
cat_means = pisa_df.groupby(['ST46Q01', 'ST82Q02'])['ST02Q01'].mean()
cat_means = cat_means.reset_index(name = 'grade_avg')
cat_means = cat_means.pivot(index = 'ST82Q02', columns = 'ST46Q01',
values = 'grade_avg')
sb.heatmap(cat_means, annot = True, fmt = '.3f',
cbar_kws = {'label' : 'mean(grade)'});
# Set the labels
plt.ylabel("Homework Once a Week/Back in Time")
plt.xlabel("Homework completed in time");
Homework once a week (back in time) is strongly associated with high grades, whether or not the student completes it in time.
# Let's draw the relation in a heatmap
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
cat_means = pisa_df.groupby(['ST46Q01', 'ST82Q03'])['ST02Q01'].mean()
cat_means = cat_means.reset_index(name = 'grade_avg')
cat_means = cat_means.pivot(index = 'ST82Q03', columns = 'ST46Q01',
values = 'grade_avg')
sb.heatmap(cat_means, annot = True, fmt = '.3f',
cbar_kws = {'label' : 'mean(grade)'});
# Set the labels
plt.ylabel("Homework Once a Week/Not Back in Time ")
plt.xlabel("Homework completed in time");
Students who do not get homework once a week (not back in time), or get it only sometimes, tend to have high grades whether or not they complete their homework; grades are higher still when they do complete it.
# Let's draw the relation in a bar chart
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.barplot(data = pisa_df, x = 'ST46Q01', y = 'ST02Q01', hue = 'ST82Q03')
# Set the legend
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework Once a Week/Not Back in Time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Homework completed in time");
Students who complete their homework in time and whose teacher gives homework once a week (not back in time) get lower grades.
# Let's draw the relation in a pointplot
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.pointplot(data = pisa_df, x = 'ST46Q01', y = 'ST02Q01', hue = 'ST82Q03',
dodge = 0.3, linestyles = "")
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework Once a Week/Not Back in Time ');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Homework completed in time")
Students who do not get homework once a week (not back in time) get high grades whether or not they complete their homework, and those who complete it get higher grades still.
# Let's draw the relation in a heatmap
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
cat_means = pisa_df.groupby(['ST46Q01', 'ST83Q01'])['ST02Q01'].mean()
cat_means = cat_means.reset_index(name = 'grade_avg')
cat_means = cat_means.pivot(index = 'ST83Q01', columns = 'ST46Q01',
values = 'grade_avg')
sb.heatmap(cat_means, annot = True, fmt = '.3f',
cbar_kws = {'label' : 'mean(grade)'});
# Set the labels
plt.ylabel("Lets Us Know We Have to Work Hard")
plt.xlabel("Homework completed in time");
Students who complete their homework in time and whose teacher does not let them know that they have to work hard get the highest grades.
# Let's draw the relation in a pointplot
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.pointplot(data = pisa_df, x = 'ST46Q01', y = 'ST02Q01', hue = 'ST83Q01',
dodge = 0.3, linestyles = "")
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Lets Us Know We Have to Work Hard');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Homework completed in time")
Students whose teacher lets them know that they have to work hard get the worst grades, and those whose teacher does not get the highest.
# Let's draw the relation in a bar chart
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.barplot(data = pisa_df, x = 'ST46Q02', y = 'ST02Q01', hue = 'ST82Q01')
# Set the legend
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework everyday/Back in Time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("works hard on homework");
Students who do not work hard on homework get the highest marks, whether or not they get homework every day.
# Let's draw the relation in a heatmap
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
cat_means = pisa_df.groupby(['ST46Q02', 'ST82Q01'])['ST02Q01'].mean()
cat_means = cat_means.reset_index(name = 'grade_avg')
cat_means = cat_means.pivot(index = 'ST82Q01', columns = 'ST46Q02',
values = 'grade_avg')
sb.heatmap(cat_means, annot = True, fmt = '.3f',
cbar_kws = {'label' : 'mean(grade)'});
# Set the labels
plt.ylabel("Homework everyday/Back in Time")
plt.xlabel("works hard on homework");
The harder students work on homework, the lower their marks.
# Let's draw the relation in a bar chart
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.barplot(data = pisa_df, x = 'ST46Q03', y = 'ST02Q01', hue = 'ST82Q01')
# Set the legend
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework everyday back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Prepared for Exams");
Students who prepare for exams and whose teacher gives homework every day (back in time) get higher grades.
# Let's draw the relation in a heatmap
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
cat_means = pisa_df.groupby(['ST46Q03', 'ST82Q01'])['ST02Q01'].mean()
cat_means = cat_means.reset_index(name = 'grade_avg')
cat_means = cat_means.pivot(index = 'ST82Q01', columns = 'ST46Q03',
values = 'grade_avg')
sb.heatmap(cat_means, annot = True, fmt = '.3f',
cbar_kws = {'label' : 'mean(grade)'});
# Set the labels
plt.ylabel("Homework everyday back in time")
plt.xlabel("Prepared for Exams");
Students who got homework every day (back in time) got higher grades, whether or not they prepared for exams.
# Let's draw the relation in a bar chart
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.barplot(data = pisa_df, x = 'ST46Q03', y = 'ST02Q01', hue = 'ST82Q02')
# Set the legend
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework once a week back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Prepared for Exams");
Students who prepare for exams and whose teacher gives homework once a week (back in time) get higher grades.
# Let's draw the relation in a heatmap
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
cat_means = pisa_df.groupby(['ST46Q03', 'ST82Q02'])['ST02Q01'].mean()
cat_means = cat_means.reset_index(name = 'grade_avg')
cat_means = cat_means.pivot(index = 'ST82Q02', columns = 'ST46Q03',
values = 'grade_avg')
sb.heatmap(cat_means, annot = True, fmt = '.3f',
cbar_kws = {'label' : 'mean(grade)'});
# Set the labels
plt.ylabel("Homework once a week back in time")
plt.xlabel("Prepared for Exams");
Students who got homework once a week (back in time) got higher grades when they did not prepare for exams.
# Let's draw the relation in a pointplot
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.pointplot(data = pisa_df, x = 'ST46Q03', y = 'ST02Q01', hue = 'ST82Q02',
dodge = 0.3, linestyles = "")
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework once a week back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Prepared for Exams")
Students who prepared for exams got high marks, whether or not they got homework once a week.
# Let's draw the relation in a bar chart
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.barplot(data = pisa_df, x = 'ST46Q03', y = 'ST02Q01', hue = 'ST82Q03')
# Set the legend
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework once a week Not back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Prepared for Exams");
Students who prepare for exams and whose teacher gives homework once a week (not back in time) get lower grades.
# Let's draw the relation in a heatmap
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
cat_means = pisa_df.groupby(['ST46Q03', 'ST82Q03'])['ST02Q01'].mean()
cat_means = cat_means.reset_index(name = 'grade_avg')
cat_means = cat_means.pivot(index = 'ST82Q03', columns = 'ST46Q03',
values = 'grade_avg')
sb.heatmap(cat_means, annot = True, fmt = '.3f',
cbar_kws = {'label' : 'mean(grade)'});
# Set the labels
plt.ylabel("Homework once a week Not back in time")
plt.xlabel("Prepared for Exams");
Students who often got homework once a week (not back in time) got higher grades, whether or not they prepared for exams.
# Let's draw the relation in a pointplot
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.pointplot(data = pisa_df, x = 'ST46Q03', y = 'ST02Q01', hue = 'ST82Q03',
dodge = 0.3, linestyles = "")
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework once a week Not back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Prepared for Exams");
Students who only sometimes or never got homework once a week got higher grades, whether or not they prepared for exams.
# Let's draw the relation in a bar chart
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.barplot(data = pisa_df, x = 'ST46Q03', y = 'ST02Q01', hue = 'ST83Q01')
# Set the legend
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Lets Us Know We Have to Work Hard');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Prepared for Exams");
Students who prepare for exams and whose teacher lets them know they have to work hard get lower grades.
# Let's draw the relation in a heatmap
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
cat_means = pisa_df.groupby(['ST46Q03', 'ST83Q01'])['ST02Q01'].mean()
cat_means = cat_means.reset_index(name = 'grade_avg')
cat_means = cat_means.pivot(index = 'ST83Q01', columns = 'ST46Q03',
values = 'grade_avg')
sb.heatmap(cat_means, annot = True, fmt = '.3f',
cbar_kws = {'label' : 'mean(grade)'});
# Set the labels
plt.ylabel("Lets Us Know We Have to Work Hard")
plt.xlabel("Prepared for Exams");
Students whose teacher lets them know they have to work hard and who do not prepare for exams got higher grades.
# Let's draw the relation in a pointplot
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.pointplot(data = pisa_df, x = 'ST46Q03', y = 'ST02Q01', hue = 'ST83Q01',
dodge = 0.3, linestyles = "")
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Lets Us Know We Have to Work Hard');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Prepared for Exams");
Students whose teachers do not let them know that they have to work hard get higher grades, whether or not they prepare for exams.
# Let's draw the relation in a bar chart
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.barplot(data = pisa_df, x = 'ST46Q04', y = 'ST02Q01', hue = 'ST82Q01')
# Set the legend
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework everyday back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Study hard for quizzes");
Homework every day goes with high grades, whether or not students study hard for quizzes.
# Let's draw the relation in a heatmap
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
cat_means = pisa_df.groupby(['ST46Q04', 'ST82Q01'])['ST02Q01'].mean()
cat_means = cat_means.reset_index(name = 'grade_avg')
cat_means = cat_means.pivot(index = 'ST82Q01', columns = 'ST46Q04',
values = 'grade_avg')
sb.heatmap(cat_means, annot = True, fmt = '.3f',
cbar_kws = {'label' : 'mean(grade)'});
# Set the labels
plt.ylabel("Homework everyday back in time")
plt.xlabel("Study hard for quizzes");
High grades are positively correlated with not studying hard.
# Let's draw the relation in a pointplot
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.pointplot(data = pisa_df, x = 'ST46Q04', y = 'ST02Q01', hue = 'ST82Q01',
dodge = 0.3, linestyles = "")
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework everyday back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Study hard for quizzes");
Students who study hard got higher grades. The opposite pattern in the heatmap may come from students who do not study hard yet got the highest grades; these may be outliers.
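Whether a surprising cell mean rests on only a handful of students can be checked by putting the cell counts next to the means. A minimal sketch on invented data; the column names and the threshold of 3 are illustrative assumptions:

```python
import pandas as pd

# Hypothetical answers and grades (not the PISA data)
demo = pd.DataFrame({
    "study_hard": ["Agree"] * 6 + ["Disagree"] * 2,
    "homework_daily": ["Agree", "Agree", "Agree", "Disagree",
                       "Disagree", "Disagree", "Agree", "Disagree"],
    "grade": [2, 3, 2, 1, 2, 1, 6, 1],
})

# Mean grade and number of students in every combination of answers
cells = demo.groupby(["study_hard", "homework_daily"])["grade"].agg(["mean", "count"])
# Cells backed by fewer than 3 students (arbitrary threshold for this demo)
small = cells[cells["count"] < 3]
```

In this toy table the highest mean belongs to a cell with a single student, which is the kind of pattern the outlier remark above describes.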
# Let's draw the relation in a bar chart
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.barplot(data = pisa_df, x = 'ST46Q05', y = 'ST02Q01', hue = 'ST82Q01')
# Set the legend
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework everyday back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Study until I understand everything");
Homework every day goes with high grades, whether or not students study until they understand everything.
# Let's draw the relation in a heatmap
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
cat_means = pisa_df.groupby(['ST46Q05', 'ST82Q01'])['ST02Q01'].mean()
cat_means = cat_means.reset_index(name = 'grade_avg')
cat_means = cat_means.pivot(index = 'ST82Q01', columns = 'ST46Q05',
values = 'grade_avg')
sb.heatmap(cat_means, annot = True, fmt = '.3f',
cbar_kws = {'label' : 'mean(grade)'});
# Set the labels
plt.ylabel("Homework everyday back in time")
plt.xlabel("Study until I understand everything");
High grades are positively correlated with not studying until everything is understood.
# Let's draw the relation in a pointplot
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.pointplot(data = pisa_df, x = 'ST46Q05', y = 'ST02Q01', hue = 'ST82Q01',
dodge = 0.3, linestyles = "")
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework everyday back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Study until I understand everything");
Students who study until they understand got higher grades. The opposite pattern in the heatmap may come from students who do not study until they understand yet got the highest grades; these may be outliers.
# Let's draw the relation in a bar chart
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.barplot(data = pisa_df, x = 'ST46Q06', y = 'ST02Q01', hue = 'ST82Q01')
# Set the legend
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework everyday back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Pay attention in classes");
Homework every day goes with high grades, whether or not students pay attention in class.
# Let's draw the relation in a heatmap
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
cat_means = pisa_df.groupby(['ST46Q06', 'ST82Q01'])['ST02Q01'].mean()
cat_means = cat_means.reset_index(name = 'grade_avg')
cat_means = cat_means.pivot(index = 'ST82Q01', columns = 'ST46Q06',
values = 'grade_avg')
sb.heatmap(cat_means, annot = True, fmt = '.3f',
cbar_kws = {'label' : 'mean(grade)'});
# Set the labels
plt.ylabel("Homework everyday back in time")
plt.xlabel("Pay attention in classes");
High grades are positively correlated with not paying attention in class.
# Let's draw the relation in a pointplot
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.pointplot(data = pisa_df, x = 'ST46Q06', y = 'ST02Q01', hue = 'ST82Q01',
dodge = 0.3, linestyles = "")
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework everyday back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Pay attention in classes");
Students who pay attention in class got higher grades. The correlation seen in the heatmap may be driven by outliers: students who don't pay attention but still got the highest grades.
# Let's draw the relation in a bar chart
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.barplot(data = pisa_df, x = 'ST46Q07', y = 'ST02Q01', hue = 'ST82Q01')
# Set the legend
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework everyday back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Listen in classes");
Teachers who give homework every day are associated with higher grades, regardless of whether students listen in class.
# Let's draw the relation in a heatmap
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
# Select the grade column before averaging so non-numeric columns are not aggregated
cat_means = pisa_df.groupby(['ST46Q07', 'ST82Q01'])['ST02Q01'].mean()
cat_means = cat_means.reset_index(name = 'grade_avg')
cat_means = cat_means.pivot(index = 'ST82Q01', columns = 'ST46Q07',
values = 'grade_avg')
sb.heatmap(cat_means, annot = True, fmt = '.3f',
cbar_kws = {'label' : 'mean(grade)'});
# Set the labels
plt.ylabel("Homework everyday back in time")
plt.xlabel("Listen in classes");
In the heatmap, high grades correlate positively with getting homework every day.
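Because both the survey answers and the grade are ordinal, a rank correlation is one way to put a number on a claim like this. The sketch below is an illustration under stated assumptions: the frame `toy_df`, its values, and the `freq_order` mapping from answer labels to ranks are hypothetical, not taken from the dataset. Spearman's coefficient is computed here as the Pearson correlation of the ranks, so only pandas is needed.

```python
import pandas as pd

# Hypothetical stand-in for pisa_df; all values are invented for illustration
toy_df = pd.DataFrame({
    "ST82Q01": ["Never", "Rarely", "Sometimes", "Always", "Always", "Never"],
    "ST02Q01": [1.0, 2.0, 3.0, 5.0, 4.0, 2.0],
})

# Assumed ordering of the answer labels, from least to most frequent
freq_order = {"Never": 0, "Rarely": 1, "Sometimes": 2, "Always": 3}
codes = toy_df["ST82Q01"].map(freq_order)

# Spearman rank correlation = Pearson correlation of the (average) ranks
rho = codes.rank().corr(toy_df["ST02Q01"].rank())
print(round(rho, 3))
```

A coefficient like this only describes association between the two answers; it says nothing about whether homework causes the higher grades.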
# Let's draw the relation in a point plot
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.pointplot(data = pisa_df, x = 'ST46Q07', y = 'ST02Q01', hue = 'ST82Q01',
dodge = 0.3, linestyles = "")
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework everyday back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Listen in classes");
Students who listen in class got higher grades. The correlation seen in the heatmap may be driven by outliers: students who don't listen in class but still got the highest grades.
# Let's draw the relation in a bar chart
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.barplot(data = pisa_df, x = 'ST46Q08', y = 'ST02Q01', hue = 'ST82Q01')
# Set the legend
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework everyday back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Avoid distractions when studying");
Teachers who give homework every day are associated with higher grades, regardless of whether students avoid distractions.
# Let's draw the relation in a heatmap
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
# Select the grade column before averaging so non-numeric columns are not aggregated
cat_means = pisa_df.groupby(['ST46Q08', 'ST82Q01'])['ST02Q01'].mean()
cat_means = cat_means.reset_index(name = 'grade_avg')
cat_means = cat_means.pivot(index = 'ST82Q01', columns = 'ST46Q08',
values = 'grade_avg')
sb.heatmap(cat_means, annot = True, fmt = '.3f',
cbar_kws = {'label' : 'mean(grade)'});
# Set the labels
plt.ylabel("Homework everyday back in time")
plt.xlabel("Avoid distractions when studying");
In the heatmap, high grades appear to correlate positively with not avoiding distractions when studying.
# Let's draw the relation in a point plot
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.pointplot(data = pisa_df, x = 'ST46Q08', y = 'ST02Q01', hue = 'ST82Q01',
dodge = 0.3, linestyles = "")
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework everyday back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Avoid distractions when studying");
Students who avoid distractions when studying got higher grades. The correlation seen in the heatmap may be driven by outliers: students who don't avoid distractions but still got the highest grades.
# Let's draw the relation in a bar chart
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.barplot(data = pisa_df, x = 'ST46Q09', y = 'ST02Q01', hue = 'ST82Q01')
# Set the legend
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework everyday back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Keep work organized");
Teachers who give homework every day are associated with higher grades, regardless of whether students keep their work organized.
# Let's draw the relation in a heatmap
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
# Select the grade column before averaging so non-numeric columns are not aggregated
cat_means = pisa_df.groupby(['ST46Q09', 'ST82Q01'])['ST02Q01'].mean()
cat_means = cat_means.reset_index(name = 'grade_avg')
cat_means = cat_means.pivot(index = 'ST82Q01', columns = 'ST46Q09',
values = 'grade_avg')
sb.heatmap(cat_means, annot = True, fmt = '.3f',
cbar_kws = {'label' : 'mean(grade)'});
# Set the labels
plt.ylabel("Homework everyday back in time")
plt.xlabel("Keep work organized");
In the heatmap, high grades appear to correlate positively with students who aren't organized.
# Let's draw the relation in a point plot
# Change figure size to be bigger
fig=plt.figure(figsize=[10, 5])
ax = sb.pointplot(data = pisa_df, x = 'ST46Q09', y = 'ST02Q01', hue = 'ST82Q01',
dodge = 0.3, linestyles = "")
ax.legend(loc = 8, ncol = 4, framealpha = 1, title = 'Homework everyday back in time');
# Set the labels
ax.set_ylabel("Avg Grade")
ax.set_xlabel("Keep work organized");
Students who keep their work organized got higher grades. The correlation seen in the heatmap may be driven by outliers: students who don't keep their work organized but still got the highest grades.
As assignments increase, grades increase, but when assignments increase too much, grades start to decrease. Also, teachers who tell students that they should work hard are associated with the lowest grades, while teachers who don't are associated with the highest grades.
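A rise-then-fall pattern like the one described for assignments is easiest to check when the answer categories are put in their natural order before averaging. The following is a minimal sketch, assuming a hypothetical column `assignments` with made-up ordered `levels`; neither the column name nor the labels nor the values come from the dataset.

```python
import pandas as pd

# Assumed ordering of the answer labels, from least to most frequent
levels = ["Never", "Sometimes", "Often", "Always"]

# Hypothetical stand-in data; all values are invented for illustration
toy_df = pd.DataFrame({
    "assignments": ["Never", "Sometimes", "Often", "Always",
                    "Often", "Never", "Always", "Sometimes"],
    "ST02Q01": [2.0, 3.0, 5.0, 3.0, 4.0, 1.0, 2.0, 3.0],
})

# An ordered categorical keeps the levels in their natural order when grouping
toy_df["assignments"] = pd.Categorical(toy_df["assignments"],
                                       categories=levels, ordered=True)

# Mean grade per level, and the level where the mean peaks
mean_by_level = toy_df.groupby("assignments", observed=False)["ST02Q01"].mean()
peak_level = mean_by_level.idxmax()
print(mean_by_level)
print("Peak at:", peak_level)
```

If the peak falls on an interior level rather than the last one, the means rise and then fall, which matches the pattern described above.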
# Let's save our cleaned data
agree_disagree_df_clean.to_csv("agree_disagree_df_clean.csv")
teacher_roles_df_clean.to_csv("teacher_roles_df_clean.csv")
student_roles_df_clean.to_csv("student_roles_df_clean.csv")
facilities_df_clean.to_csv("facilities_df_clean.csv")